Abstract

We investigate the question of tightness of linear programming (LP) relaxation for finding a maximum
weight independent set (MWIS) in sparse random weighted graphs. We show that an edge-based LP
relaxation is asymptotically tight for the Erdős-Rényi graph G(n, c/n) for c < 2e and the random regular graph
G(n, r) for r < 4 when node weights are i.i.d. with exponential distribution of mean 1. We establish
these results through a precise relation between the tightness of LP relaxation and convergence of
the max-product belief propagation algorithm. We believe that this novel method of understanding
structural properties of combinatorial problems through properties of iterative procedures such as
max-product should be of interest in its own right.



1 Introduction 

The max-weight independent set (MWIS) problem is the following: given a graph with weights on the 
nodes, find the heaviest set of disjoint nodes. It is a canonical combinatorial optimization problem, known 
to be NP-hard [14] and hard to approximate [15] in the worst case. This has led to considerable interest in 
average-case characterizations of fundamental structural properties, as well as the hardness, of the MWIS 
problem. We summarize some of this work in Section 1.1. 

In this paper we establish that, with high probability, the standard simple edge-based linear programming
(LP) relaxation of the MWIS problem is asymptotically tight on a 1 - o(1) fraction of the nodes, for
the Erdős-Rényi graph G(n, c/n) for c < 2e and the random regular graph G(n, r) for r < 4, when the node
weights are drawn i.i.d. with exponential distribution of mean 1. This means that problems from this
ensemble are "easy" with high probability, since the LP can be solved in polynomial time [10].

We arrive at this result via an analysis of the max-product form of belief propagation. In particular, 
we establish the following two properties of max-product: (a) for any arbitrary problem instance,
max-product succeeds¹ for a given node only if every LP optimum assignment is integral for that node, and (b)
for the random weighted graphs above, max-product succeeds for almost all nodes, with high probability. 
To the best of our knowledge, our work represents the first instance where analysis of iterative procedures 
like max-product has been used as a tool to establish fundamental properties of optimization problems on 
graphs; usually, the analysis goes the other way. We believe that this method of analysis has the potential 
to shed insight into other graph-theoretic/algorithmic problems beyond the ones presented in this paper. 



*Laboratory for Information and Decision Systems, MIT. Email: {sanghavi, devavrat}@mit.edu
¹That is, the estimate of max-product converges, though it may not necessarily be correct.
Other motivations for our work are: (1) to obtain a better understanding of iterative procedures like
max-product and their relation to underlying problem structure, a topic of much recent interest (for example,
[33-37]), and (2) to characterize the performance of max-product as a simple, distributed solution to the
MWIS problem, in applications where such a solution is needed (e.g. wireless scheduling [11]).

We now summarize some of the most closely related work on average-case analysis of graph problems 
(in particular MWIS), and on the analysis of max-product. Then we summarize our main contributions, 
and provide an outline for the rest of the paper. 

1.1 Related Work 

We now summarize the most relevant existing literature in the two areas concerning this paper: average- 
case analysis of independent set problems, and analysis of max-product and belief propagation. 

There has been much work on average-case characterizations of hard problems; see [16, 17]. For the
(unweighted) independent set problem, there has been much work on the dense Erdős-Rényi graphs G(n, 1/2),
where it is known that the max-size independent set is of size $2 \log_2 n$ almost surely. However, no algorithm
is known to efficiently find an independent set of size significantly larger than $\log_2 n$ [16, 18]. Feige [19],
and Feige and Krauthgamer [20] investigate the tightness of the Lovász-Schrijver hierarchy of relaxations
for these graphs: relaxation up to the log n level leads to a 1 - o(1) approximation. In contrast, here we
prove tightness of LP relaxation for sparse random graphs.

There has also been substantial interest in independent set problems on sparse random graphs. Karp
and Sipser [21] analyze a greedy algorithm for matchings on sparse Erdős-Rényi graphs G(n, c/n); a similar
analysis for independent set shows that the algorithm works for c < e (see e.g. [8] for details). Frieze and
Suen [23] investigate the success of a greedy algorithm for independent sets in random 3-regular graphs.
Bollobás [22] provides an upper bound on the size of the largest independent set for regular graphs.

There is a tremendous amount of literature on the study of message-passing algorithms for inference. 
We will now briefly review the existing work most directly related to this paper. The main results of this 
paper illustrate a close relationship between the performance of the max-product algorithm, and linear 
programming. Such precise or semantic connections have been noticed in other contexts: in decoding
of linear codes [4], weighted matching [1-3] and weighted independent set [5]. For more general inference
problems, several authors [24-27] develop alternative message-passing algorithms that explicitly solve linear 
programming relaxations. These are related to, but not the same as, the classical max-product belief 
propagation that we will use in this paper. 

1.2 Our Contributions and Paper Organization 

In this paper we investigate the (simple, edge-based) LP relaxation of the MWIS problem for certain
random ensembles. The weights on the nodes in our random ensembles are drawn from a continuous
distribution, and hence the LP optimum $x^*$ will be unique with probability 1. In general, each node i
will be assigned a (possibly fractional²) value $x_i^*$ at this optimum. Our main result in this paper is the
following theorem.

²In fact, it is known [6, Theorem 64.7] that the edge-based LP is half-integral: $x_i^* = 0$, 1 or 1/2.
Theorem 1.1 Let G be an Erdős-Rényi random graph G(n, c/n) for c < 2e, or a random regular graph
G(n, r) for r < 4. Suppose each node has a weight that is chosen to be i.i.d. with exponential distribution
of mean 1, independent of the graph. Let $x^*$ be the optimum of the edge-based LP relaxation of the MWIS
problem (described in Section 2). Let i be a node picked uniformly at random, independent of everything
else³. Then, given any $\epsilon > 0$, there exists an $N(\epsilon)$ such that $\Pr[x_i^* = 0 \text{ or } 1] > 1 - \epsilon$ as long as $n > N(\epsilon)$.

The above theorem states that LP will be tight on a fraction 1 - o(1) of the nodes, with high probability.
We arrive at this result via the novel route of max-product belief propagation. Max-product is an iterative
algorithm, and at every iteration produces an estimate $x_i^t$ = 0, 1 or ? (corresponding to "not in the MWIS",
"in the MWIS", and "don't know" respectively) for each node i and time t. In this paper, we prove the
following two results on the performance of max-product:

1. For any arbitrary graph with arbitrary weights, consider a node i. If there exists any LP optimum
$x^*$ that puts a fractional mass $x_i^*$ on node i, then the max-product estimate will be $x_i^t$ = 1 or ? for
every odd time t, and $x_i^t$ = 0 or ? for every even time t. This is Theorem 3.1.

2. For the random graph ensembles, and a randomly chosen node i as in Theorem 1.1 above, given any
$\epsilon > 0$ there exist an $N(\epsilon)$ and a $t(\epsilon)$ such that for all $n > N(\epsilon)$, the max-product estimate $x_i^t$ remains
constant and equal to either 0 or 1, for all $t > t(\epsilon)$. This is Theorem 4.1.

Note that the first result (Theorem 3.1) is non-asymptotic: it holds for finite graphs and number of 
iterations (i.e. it is not a "fixed point" analysis). This is crucial in avoiding an "order of limits" problem 
(between the number of iterations and the size of the problem) that may otherwise come up in establishing 
the overall result. The second result is established using ideas from the method of local weak convergence 
[7,8]. 

In addition to the fact that they together immediately imply our main result above, we believe that
each of the two theorems above is interesting in its own right. The first theorem generalizes to the case
when "clique factors" are added to max-product, and clique constraints to the corresponding LP. However,
in this paper we will concentrate only on the simplest edge-based case.

The theorems also shed light on the usefulness of max-product as a distributed heuristic for the
MWIS problem. Consider first an arbitrary graph. In light of Theorem 3.1, we can stop max-product
after a certain number of iterations and check for one-step agreement: if the estimates satisfy
$x_i^t = x_i^{t+1}$ = 0 or 1, then we know that $x_i^*$ equals this common value at every LP optimum. This is
then also consistent with the MWIS (see Lemma 2.1 below). This means that the set of nodes for which
$x_i^t = x_i^{t+1} = 1$ will form an independent set; we
can stop max-product at any time and have a candidate independent set (although it may not be the
MWIS). If we restrict our attention to the ensembles considered above, then if $n > N(\epsilon)$ and max-product is
run for a sufficient number of iterations $t > t(\epsilon)$, the set obtained from one-step agreement will be
large. The fact that the weights come from an exponential distribution means that this candidate
independent set will be a very good approximation of the MWIS. This is shown in Theorem 4.2.

The rest of the paper is organized as follows. In Section 2 we lay out the groundwork and preliminaries. 
We also describe precisely the max-product algorithm we are considering in this paper, and state some of 
its other known properties. In Section 3 we state and prove Theorem 3.1. In Section 4 we state and prove 
Theorem 4.1, and Theorem 4.2. 

³Or, alternatively, one can think of having an a priori node numbering before the edges and weights are picked.
2 Preliminaries: MWIS, LP relaxation and max-product



Consider a graph G = (V, E), with a set V of nodes and a set E of edges. Let $N(i) = \{j \in V : (i,j) \in E\}$
be the neighbors of $i \in V$. Positive weights $w_i$, $i \in V$, are associated with each node. A subset of V will
be represented by a vector $x = (x_i) \in \{0,1\}^{|V|}$, where $x_i = 1$ means i is in the subset and $x_i = 0$ means i is
not in the subset. A subset x is called an independent set if no two nodes in the subset are connected by
an edge: $(x_i, x_j) \neq (1,1)$ for all $(i,j) \in E$. We are interested in finding a maximum weight independent
set (MWIS) $x^*$. This can be naturally posed as an integer program, denoted below by IP. The linear
programming relaxation⁴ of IP is obtained by replacing the integrality constraints $x_i \in \{0,1\}$ with the
constraints $x_i \geq 0$. We will denote the corresponding linear program by LP. The dual of LP is denoted
below by DUAL.

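$$\text{IP:}\quad \max \sum_{i=1}^{n} w_i x_i \ \text{ over } x_i \in \{0,1\}, \qquad \text{s.t. } x_i + x_j \leq 1 \ \text{ for all } (i,j) \in E,$$

$$\text{LP:}\quad \max \sum_{i=1}^{n} w_i x_i \ \text{ over } x_i \geq 0, \qquad \text{s.t. } x_i + x_j \leq 1 \ \text{ for all } (i,j) \in E,$$

where n = |V|.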
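
For reference, the dual of LP follows from standard LP duality. The following is a sketch only: the edge multipliers $\lambda_{ij}$ are our own notation, an assumption rather than necessarily the symbols used by the authors.

$$\text{DUAL:}\quad \min \sum_{(i,j) \in E} \lambda_{ij} \qquad \text{s.t. } \sum_{j \in N(i)} \lambda_{ij} \geq w_i \ \text{ for all } i \in V, \qquad \lambda_{ij} \geq 0 \ \text{ for all } (i,j) \in E.$$

Each primal edge constraint contributes one dual variable, and each primal variable $x_i$ contributes one covering constraint on the edges incident to i.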

It is well-known that LP can be solved efficiently, and if it has an integral optimal solution then this
solution is an MWIS of G. If this is the case, we say that there is no integrality gap between LP and IP or,
equivalently, that the LP relaxation is tight. We refer the interested reader to the book by Schrijver [6] for many
interesting properties of the LP. We note one property: partial correctness.
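
To make the notion of tightness concrete, the following minimal sketch compares IP and LP on two tiny graphs. It is an illustration only: rather than calling an LP solver, it relies on the half-integrality of the edge-based LP's extreme points and simply enumerates vectors with entries in {0, 1/2, 1}. The helper names `is_feasible` and `best_over` are ours, and the graphs are assumed to have no isolated nodes so that LP is bounded.

```python
from itertools import product

def is_feasible(x, edges):
    """Check the edge constraints x_i + x_j <= 1."""
    return all(x[i] + x[j] <= 1 for i, j in edges)

def best_over(values, weights, edges):
    """Maximize sum_i w_i x_i over feasible x with entries drawn from `values`.

    values=(0, 1) gives IP; values=(0, 0.5, 1) gives LP, assuming the LP
    optimum is attained at a half-integral extreme point.
    """
    best_val, best_x = float("-inf"), None
    for x in product(values, repeat=len(weights)):
        if is_feasible(x, edges):
            val = sum(w * xi for w, xi in zip(weights, x))
            if val > best_val:
                best_val, best_x = val, x
    return best_val, best_x

# Triangle with unit weights: the LP optimum is fractional, so LP is not tight.
triangle = [(0, 1), (1, 2), (0, 2)]
ip_tri = best_over((0, 1), [1, 1, 1], triangle)
lp_tri = best_over((0, 0.5, 1), [1, 1, 1], triangle)

# Path 0-1-2 with a heavy middle node: the LP optimum is integral, so LP is tight.
path = [(0, 1), (1, 2)]
ip_path = best_over((0, 1), [1, 3, 1], path)
lp_path = best_over((0, 0.5, 1), [1, 3, 1], path)
```

On the triangle the LP value exceeds the IP value, while on the path the two programs share the same optimum. Exhaustive enumeration is only viable for a handful of nodes; it is used here purely to keep the sketch dependency-free.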



Lemma 2.1 ([6], Corollary 64.9a) LP optima are partially correct: for any graph, any LP optimum
$x^*$ and any node i, if the mass $x_i^*$ is integral then there exists an MWIS of G for which i's membership is
given by $x_i^*$.



The classical max-product algorithm is a heuristic that can be used to find the MAP assignment of
a probability distribution. For completeness, before stating the (simplified) max-product algorithm applied
to the MWIS problem, we state the probability distribution whose MAP solution corresponds to the solution of
the MWIS problem. Given an MWIS problem on G = (V, E), associate a binary random variable $X_i$
with each $i \in V$ and consider the following joint distribution: for $x \in \{0,1\}^n$,

$$p(x) = \frac{1}{Z} \prod_{(i,j) \in E} \mathbf{1}_{\{x_i + x_j \leq 1\}} \prod_{i \in V} \exp(w_i x_i), \qquad (1)$$

where Z is the normalization constant. In the above, $\mathbf{1}$ is the standard indicator function: $\mathbf{1}_{\text{true}} = 1$ and
$\mathbf{1}_{\text{false}} = 0$. It is easy to see that $p(x) = \frac{1}{Z} \exp\left(\sum_i w_i x_i\right)$ if x is an independent set, and $p(x) = 0$ otherwise.
Thus, any MAP estimate $\arg\max_x p(x)$ corresponds to a maximum weight independent set of G.

Here, we present a simplified version of the max-product algorithm (obtained by taking the logarithm
of the ratio of messages in the original algorithm); we refer the interested reader to [5] for details on the
transformation. The algorithm is iterative; in iteration t each node i sends a message $\gamma^t_{i \to j}$ to each
neighbor $j \in N(i)$ based on $\{\gamma^{t-1}_{k \to i}\}$, $k \in N(i)$. Each node i also maintains an estimate $x_i(\gamma^t)$ of its assignment
in the independent set, based on the messages $\{\gamma^t_{j \to i}\}$, $j \in N(i)$, that it received. The following describes the
message and estimate updates, as well as the final output.

⁴Other (tighter) LP relaxations of IP are possible, and some of our results carry over to those relaxations as well. However,
in this paper we will concentrate only on the LP relaxation presented above.
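Max-product for MWIS

(o) Initially, t = 1 and $\gamma^1_{i \to j} = 0$ for all $(i,j) \in E$.

(i) The messages are updated as follows: for t > 1,

$$\gamma^t_{i \to j} = \max\Big\{0,\ w_i - \sum_{k \in N(i) - j} \gamma^{t-1}_{k \to i}\Big\}. \qquad (2)$$

(ii) Estimate max. wt. independent set $x(\gamma^t)$ as follows:

$$x_i(\gamma^t) = \begin{cases} 1 & \text{if } w_i > \sum_{k \in N(i)} \gamma^t_{k \to i}, \\ 0 & \text{if } w_i < \sum_{k \in N(i)} \gamma^t_{k \to i}, \\ ? & \text{if } w_i = \sum_{k \in N(i)} \gamma^t_{k \to i}. \end{cases} \qquad (3)$$

(iii) Update t = t + 1; repeat from (i).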
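
The message update (2) and the estimate rule (3) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `max_product`, the edge-list input format and the synchronous (parallel) update schedule are our choices, and the estimate rule is assumed to compare $w_i$ with the total incoming message, returning '?' on a tie.

```python
def max_product(weights, edges, num_iters):
    """Run the simplified max-product for MWIS from gamma = 0.

    Returns a list whose entry t-1 holds the estimates at time t
    (1, 0, or '?' per node), for t = 1, ..., num_iters.
    """
    n = len(weights)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # gamma[(i, j)] is the message from i to j; at time t = 1 all are zero.
    gamma = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    history = []
    for t in range(1, num_iters + 1):
        if t > 1:
            # Message update: all messages are refreshed in parallel
            # from the previous iteration's messages.
            gamma = {
                (i, j): max(0.0, weights[i]
                            - sum(gamma[(k, i)] for k in nbrs[i] if k != j))
                for (i, j) in gamma
            }
        # Estimate: compare w_i with the total incoming message.
        est = []
        for i in range(n):
            incoming = sum(gamma[(k, i)] for k in nbrs[i])
            if weights[i] > incoming:
                est.append(1)
            elif weights[i] < incoming:
                est.append(0)
            else:
                est.append("?")
        history.append(est)
    return history

# Path 0-1-2 with a heavy middle node.
path_history = max_product([1, 3, 1], [(0, 1), (1, 2)], 4)
# Triangle with unit weights.
tri_history = max_product([1, 1, 1], [(0, 1), (1, 2), (0, 2)], 4)
```

On the path the estimates settle on the middle node from time 2 onward, whereas on the unit-weight triangle they keep flipping between all ones and all zeros.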



2.1 Max-product: known properties 

A popular technique for analyzing max-product is to consider its fixed points [28-30]. Here, we list relevant
properties of max-product for MWIS that are established in [5], to set up context for the results
stated in Section 3. Note that a set of messages $\gamma^*$ is a fixed point of max-product if, for all $(i,j) \in E$,

$$\gamma^*_{i \to j} = \Big(w_i - \sum_{k \in N(i) - j} \gamma^*_{k \to i}\Big)_+ , \qquad (4)$$

where $(a)_+ = \max\{0, a\}$. The following is a summary of results from [5].

Theorem 2.1 [5] There exists at least one fixed point $\gamma^*$ such that $\gamma^*_{i \to j} \in [0, w_i]$ for each $(i,j) \in E$.
Given such a fixed point $\gamma^*$, let $x(\gamma^*) = (x_i(\gamma^*))$ be the corresponding estimate. Define $y = (y_i) \in [0,1]^n$
as follows: $y_i = 1/2$ if $x_i(\gamma^*) = ?$; $y_i = x_i(\gamma^*)$ otherwise. Then, y corresponds to an extreme point of
the LP for MWIS.

Theorem 2.1 implies that the fixed point estimate of max-product for MWIS is an extreme point of LP, 
and hence one that maximizes some weight function consisting of positive node weights. Note however that 
these may not be the true weights $w_i$. In other words, given any MWIS problem with graph G and weights
w, each max-product fixed point represents the optimum of the LP relaxation of some MWIS problem on 
the same graph G, but possibly with different weights w. The fact that max-product estimates optimize 
a different weight function means that both eventualities are possible: LP giving the correct answer but 
max-product failing, and vice versa. In [5], two examples are presented for each one of these situations. 
These examples indicate that it may not be possible to resolve the question of relative strength of the two 
procedures based solely on an analysis of the fixed points of max-product. 
3 Max-product: Convergence and tightness of LP



Max-product is a deterministic iterative algorithm. It may have multiple fixed points, in which case it
converges (if it does) to a fixed point that depends on its starting condition. In the absence of prior information,
the "natural" (and popular) initialization of messages is the one described in Section 2 (i.e. $\gamma^1_{i \to j} = 0$ for
all $(i,j) \in E$). In this section, we directly analyze the performance of the max-product algorithm with these
initial conditions. We show that the resulting estimates are exactly characterized by optima of the
true LP, at every time instant (not just at fixed points). This implies that, if a fixed point is reached, it
will exactly reflect an optimum of LP. Our main theorem for this section is stated below.

Theorem 3.1 Given any MWIS problem on weighted graph G, suppose max-product is started from the
initial condition $\gamma = 0$. Then, for any node $i \in G$:

1. If there exists any optimum $x^*$ of LP for which the mass assigned to node i satisfies $x_i^* > 0$, then
the max-product estimate $x_i(\gamma^t)$ is 1 or ? for all odd times t.

2. If there exists any optimum $x^*$ of LP for which the mass assigned to i satisfies $x_i^* < 1$, then the
max-product estimate $x_i(\gamma^t)$ is 0 or ? for all even times t.

An important and direct consequence of the above result is as follows. 

Corollary 3.1 If LP has a non-integral optimum, i.e. there is a node i and an optimum $x^*$ with $x_i^* \in (0,1)$,
then the estimate $x_i(\gamma^t)$ either oscillates or converges to ?.
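
A small worked example, ours rather than the authors', illustrates the theorem and its corollary. Take the triangle on three nodes with unit weights. Summing the three edge constraints gives $2(x_1 + x_2 + x_3) \leq 3$, with equality only when every constraint is tight, so the unique LP optimum is $x^* = (1/2, 1/2, 1/2)$. Starting from $\gamma^1 = 0$, and assuming the estimate is 1 when $w_i$ exceeds the total incoming message and 0 when it falls below it, every node i and every edge (i, j) gives:

$$\gamma^{1}_{i \to j} = 0 \ \Rightarrow\ x_i(\gamma^1) = 1, \qquad
\gamma^{2}_{i \to j} = \max\{0,\ 1 - 0\} = 1 \ \Rightarrow\ \sum_{k \in N(i)} \gamma^{2}_{k \to i} = 2 > w_i \ \Rightarrow\ x_i(\gamma^2) = 0, \qquad
\gamma^{3}_{i \to j} = \max\{0,\ 1 - 1\} = 0 \ \Rightarrow\ x_i(\gamma^3) = 1.$$

The messages at time 3 coincide with those at time 1, so the estimates alternate between 1 at odd times and 0 at even times. This is the oscillation that both parts of Theorem 3.1 permit for a node with $0 < x_i^* < 1$.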

The proof of this theorem relies on the computation tree interpretation of max-product estimates. We 
now specify this interpretation for our problem, and then prove Theorem 3.1. 

3.1 Computation Tree for MWIS 

The proof of Theorem 3.1 relies on the well-known computation tree interpretation [28, 32] of the loopy
max-product estimates. In this section we briefly outline this interpretation. For any node i, the computation
tree at time t, denoted by $T_i(t)$, is defined recursively as follows: $T_i(1)$ is just the node i. This is the root
of the tree, and in this case is also its only leaf. The tree $T_i(t)$ at time t is generated from $T_i(t-1)$ by
adding to each leaf of $T_i(t-1)$ a copy of each of its neighbors in G, except for the one neighbor that is
already present in $T_i(t-1)$ as its parent. Each node in $T_i(t)$ is a copy of a node in G, and the weights of the
nodes in $T_i(t)$ are the same as the corresponding nodes in G. As an example, Figure 1 presents the computation
tree $T_a(4)$ for the node a in graph G at time t = 4.

Lemma 3.1 For any node i at time t, (a) $x_i(\gamma^t) = 1$ if and only if the root of $T_i(t)$ is a member of every
MWIS on $T_i(t)$; (b) $x_i(\gamma^t) = 0$ if and only if the root of $T_i(t)$ is not a member of any MWIS on $T_i(t)$; and
(c) $x_i(\gamma^t) = ?$ otherwise.

Thus the max-product estimates correspond to max-weight independent sets on the computation trees
$T_i(t)$, as opposed to on the original graph G.
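
The correspondence in Lemma 3.1 can be explored directly. The sketch below is an illustration under our own conventions (the function name `tree_root_estimate` and the adjacency-dictionary input are assumptions): it unrolls the computation tree implicitly by recursion and uses the standard include/exclude dynamic program for MWIS on a tree to decide whether the root lies in every, none, or only some of the optimal sets.

```python
def tree_root_estimate(weights, nbrs, root, t):
    """Estimate for `root` from MWIS on its computation tree T_root(t).

    Returns 1 if the root is in every MWIS of the tree, 0 if it is in
    none, and '?' if both cases occur (a tie).
    """
    def solve(node, parent, depth):
        # Returns (best weight with `node` included, best weight with it
        # excluded) for the subtree hanging below this copy of `node`.
        incl, excl = weights[node], 0.0
        if depth < t:  # copies at depth t are the leaves of the tree
            for child in nbrs[node]:
                if child == parent:
                    continue  # skip the neighbor already present as parent
                c_incl, c_excl = solve(child, node, depth + 1)
                incl += c_excl
                excl += max(c_incl, c_excl)
        return incl, excl

    incl, excl = solve(root, None, 1)
    if incl > excl:
        return 1
    if incl < excl:
        return 0
    return "?"

# Triangle with unit weights, root node 0, times t = 1, ..., 4.
tri_nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
tri_estimates = [tree_root_estimate([1, 1, 1], tri_nbrs, 0, t) for t in range(1, 5)]

# Path 0-1-2 with weights (1, 3, 1), root node 0.
path_nbrs = {0: [1], 1: [0, 2], 2: [1]}
path_estimates = [tree_root_estimate([1, 3, 1], path_nbrs, 0, t) for t in range(1, 4)]
```

The recursion visits every copy in the tree, so its cost grows exponentially with t on graphs with cycles; the message-passing form of max-product obtains the same root decision without building the tree.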
Figure 1: An example of computation tree for t = 4 iterations for a 4 node graph.

3.2 Proof of Theorem 3.1 

We now prove Theorem 3.1. We will present the proof of part (1). The proof of part (2) follows from very 
similar arguments and hence we will skip it. 

We will prove part (1) by contradiction. To this end, suppose part (1) is not true. That is, there
exists a node i, an optimum $x^*$ of LP with $x_i^* > 0$, and an odd time t at which the estimate is $x_i^t = 0$. For
brevity, in the remainder of the proof we will use the notation $x_i^t = x_i(\gamma^t)$ for the estimates. Let $T_i(t)$ be
the corresponding computation tree. Using Lemma 3.1 this means that the root i is not a member of any
MWIS of $T_i(t)$. Let I be some MWIS on $T_i(t)$. We now define the following set of nodes

$$I^* = \{ j \in T_i(t) : j \notin I, \text{ and copy of } j \text{ in } G \text{ has } x_j^* > 0 \}.$$

In words, $I^*$ is the set of nodes in $T_i(t)$ which are not in I, and whose copies in G are assigned strictly
positive mass by the LP optimum $x^*$.

Note that by assumption the root $i \in I^*$ and $i \notin I$. Now, from the root, recursively build a maximal
alternating subtree S as follows: first add root i, which is in $I^* - I$. Then add all neighbors of i that are
in $I - I^*$. Then add all their neighbors in $I^* - I$, and so on. The building of S stops either when it hits
the bottom level of the tree, or when no more nodes can be added while still maintaining the alternating
structure. Note the following properties of S:

• S is the disjoint union of $(S \cap I)$ and $(S \cap I^*)$.

• For every $j \in S \cap I$, all its neighbors in $I^*$ are included in $S \cap I^*$. Similarly for every $j \in S \cap I^*$, all
its neighbors in I are included in $S \cap I$.

• Any edge (j, k) in $T_i(t)$ has at most one endpoint in $(S \cap I)$, and at most one in $(S \cap I^*)$.

We now state a lemma, which we will prove later. The proof uses the fact that t is odd.

Lemma 3.2 The weights satisfy $w(S \cap I) \leq w(S \cap I^*)$.

We now use this lemma to prove the theorem. Consider the set I' which changes I by flipping S:

$$I' = I - (S \cap I) + (S \cap I^*).$$

We first show that I' is also an independent set on $T_i(t)$. This means that we need to show that every edge
(j, k) in $T_i(t)$ touches at most one node in I'. There are three possible scenarios for edge (j, k):
• $j, k \notin S$. In this case, membership of j, k in I' is the same as in I, which is an independent set. So
(j, k) has at most one endpoint in I'.

• One node $j \in S \cap I$. In this case, $j \notin I'$, and hence again at most one of j, k belongs to I'.

• One node $k \in S \cap I^*$ but the other node $j \notin S \cap I$. This means that $j \notin I$, because every neighbor of k
in I should be included in $S \cap I$. This means that $j \notin I'$, and hence only node $k \in I'$ for edge (j, k).

Thus I' is an independent set on $T_i(t)$. Also, by Lemma 3.2, we have that

$$w(I') \geq w(I).$$

However, I is an MWIS, and hence it follows that I' is also an MWIS of $T_i(t)$. However, by construction,
root $i \in I'$, which violates the fact that $x_i^t = 0$. The contradiction is thus established, and part (1) of
the theorem is proved. ■

3.2.1 Remaining proofs 

The proof of Lemma 3.2 involves a perturbation argument on the LP. For each node $j \in G$, let $m_j$ denote
the number of times j appears in $S \cap I$ and $n_j$ the number of times it appears in $S \cap I^*$. For $\epsilon > 0$, define

$$x = x^* + \epsilon (m - n). \qquad (5)$$

We now state a lemma that is proved immediately following this one.

Lemma 3.3 x is a feasible point for LP, for small enough $\epsilon$.

Using the above, we will complete the proof of Lemma 3.2. Since $x^*$ is an optimum of LP, it follows that
$w'x \leq w'x^*$, and so $w'm \leq w'n$. However, by definition, $w'm = w(S \cap I)$ and $w'n = w(S \cap I^*)$. This
finishes the proof of Lemma 3.2. ■

Proof of Lemma 3.3: Now, we complete the proof of the only remaining part: Lemma 3.3. We wish to show
that x as defined in (5) is a feasible point for LP, for small enough $\epsilon > 0$. To do so we have to check the node
constraints $x_j \geq 0$ and the edge constraints $x_j + x_k \leq 1$ for every edge $(j,k) \in G$. Consider first the node
constraints. Clearly we only need to check them for any j which has a copy $j \in I^* \cap S$. If this is so, then
by the definition (3.2) of $I^*$, $x_j^* > 0$. Thus, for any $m_j$ and $n_j$, making $\epsilon$ small enough can ensure that
$x_j^* + \epsilon(m_j - n_j) \geq 0$.

Before we proceed to checking the edge constraints, we make two observations. Note that for any node
j in the tree, if $j \in S \cap I$ then

• $x_j^* < 1$, i.e. the mass $x_j^*$ put on j by the LP optimum $x^*$ is strictly less than 1. This is because of the
alternating way in which the tree is constructed: a node j in the tree is included in $S \cap I$ only if the
parent p of j is in $S \cap I^*$ (note that the root $i \in S \cap I^*$ by assumption). However, from the definition
of $I^*$, this means that $x_p^* > 0$, i.e. the parent has positive mass at the LP optimum $x^*$. This means
that $x_j^* < 1$, as having $x_j^* = 1$ would mean that the edge constraint $x_p^* + x_j^* \leq 1$ is violated.
• j is not a leaf of the tree. This is because S alternates between I and $I^*$, and starts with $I^*$ at the
root in level 1 (which is odd). Hence $S \cap I$ will occupy even levels of the tree, but the tree has odd
depth (by assumption t is odd).

Now consider the edge constraints. For any edge (j, k), if the LP optimum $x^*$ is such that the constraint
is loose, i.e. if $x_j^* + x_k^* < 1$, then making $\epsilon$ small enough will ensure that $x_j + x_k \leq 1$. So we only need
to check the edge constraints which are tight at $x^*$.

For edges with $x_j^* + x_k^* = 1$, every time any copy of one of the nodes j or k is included in $S \cap I$, the
other node is included in $S \cap I^*$. This is because of the following: if j is included in $S \cap I$, and k is its
parent, we are done since this means $k \in S \cap I^*$. So suppose k is not the parent of j. From the above it
follows that j is not a leaf of the tree, and hence k will be one of its children. Also, from above, the mass
on j satisfies $x_j^* < 1$. However, by assumption $x_j^* + x_k^* = 1$, and hence the mass on k is $x_k^* > 0$. This means
that the child k has to be included in $S \cap I^*$.

It is now easy to see that the edge constraints are satisfied: for every edge constraint which is tight
at $x^*$, every time the mass on one of the endpoints is increased by $\epsilon$ (because of that node appearing in
$S \cap I$), the mass on the other endpoint is decreased by $\epsilon$ (because it appears in $S \cap I^*$). ■

4 Max-product for Random Weighted Graphs 

In this section, we establish the correctness and convergence of the max-product algorithm when the underlying
graph is random and node weights are chosen according to the exponential distribution. Specifically, we consider two
types of sparse random graphs, G(n, c/n) and G(n, r):

1. G(n, c/n) has n nodes. An edge is present between any node pair i, j with probability c/n,
independently. Thus, on average c(n - 1)/2 edges are present.

2. G(n, r) has n nodes. It is formed by sampling one of the r-regular n-node graphs uniformly at
random.

In either of these two cases, we assign node weights randomly. Specifically, let $W_i$ denote the (random)
weight of node i. Then, the $W_i$ are independent and identically distributed with exponential distribution of
mean 1. That is, for any $\zeta \geq 0$,

$$\Pr(W_i > \zeta) = \exp(-\zeta).$$
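
For experimentation, instances from both ensembles can be sampled as in the following sketch. The function names are ours, and the regular-graph sampler uses the pairing (configuration) model with rejection, which is one standard way to sample uniformly from simple r-regular graphs for small r; the paper does not prescribe a sampling method, so this choice is an assumption.

```python
import random

def sample_gnp(n, c, rng):
    """Erdos-Renyi G(n, c/n): each pair is an edge independently w.p. c/n."""
    p = c / n
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]

def sample_regular(n, r, rng, max_tries=10000):
    """Random r-regular graph via the pairing model with rejection.

    Pairings that produce self-loops or repeated edges are discarded;
    conditioned on acceptance the graph is uniform over simple
    r-regular graphs. Requires n * r to be even.
    """
    if (n * r) % 2 != 0:
        raise ValueError("n * r must be even")
    for _ in range(max_tries):
        stubs = [v for v in range(n) for _ in range(r)]
        rng.shuffle(stubs)
        edges = set()
        ok = True
        for a, b in zip(stubs[::2], stubs[1::2]):
            e = (min(a, b), max(a, b))
            if a == b or e in edges:
                ok = False
                break
            edges.add(e)
        if ok:
            return sorted(edges)
    raise RuntimeError("no simple pairing found")

def sample_weights(n, rng):
    """i.i.d. exponential weights with mean 1."""
    return [rng.expovariate(1.0) for _ in range(n)]

rng = random.Random(0)
gnp_edges = sample_gnp(200, 2.0, rng)
reg_edges = sample_regular(20, 3, rng)
w = sample_weights(200, rng)
```

The rejection step becomes slow as r grows, since the chance that a random pairing is simple decays quickly with r; for the small degrees considered here it is adequate.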

4.1 Results 

Convergence of max-product. First, we establish that for a 1 - o(1) fraction of nodes, the algorithm
converges after finitely many iterations. Formally, we state the result as follows.

Theorem 4.1 Consider graph G(n, c/n) or G(n, r) with node weights assigned independently according to the
exponential distribution of rate 1. Let c < 2e and r < 4. Then, for any $\epsilon > 0$, there exist large enough
$N(\epsilon)$ and $T(\epsilon)$ such that if $n > N(\epsilon)$, then the following holds: for any node in G(n, c/n) or G(n, r), say i,
the estimate $x_i^t$ $(= x_i(\gamma^t))$ converges to the correct value $x_i^*$, with probability⁵ at least $1 - \epsilon$ for $t > T(\epsilon)$.

Correctness of max-product. Theorem 4.1 implies that almost all nodes converge to the correct
solution. However, two questions remain: (a) how do we identify these 'converged' nodes? and (b) do they
carry a 1 - o(1) fraction of the weight of the MWIS? We answer both of these questions in the affirmative.

We state the following simple stopping condition, under which we will establish that all converged
nodes get assigned the correct values, while all other nodes get assigned the value 0. As we shall establish,
the resulting assignment is indeed an independent set with a 1 - o(1) fraction of the weight of the MWIS, with high
probability. We now give the stopping condition and its approximation property.

Stopping condition. At the end of iteration t, generate an estimate $I^t$ using $x^t$ and $x^{t-1}$ as $I^t = \{i \in V : x_i^t = x_i^{t-1} = 1\}$.
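
As a minimal sketch of this stopping rule, the candidate set is the set of nodes whose estimates equal 1 in two consecutive iterations. The helper names `one_step_agreement` and `is_independent` are ours, and the estimate vectors are assumed to hold 1, 0 or '?' per node.

```python
def one_step_agreement(est_prev, est_curr):
    """Nodes whose estimate is 1 in two consecutive iterations."""
    return [i for i, (a, b) in enumerate(zip(est_prev, est_curr))
            if a == 1 and b == 1]

def is_independent(nodes, edges):
    """True if no edge has both endpoints in `nodes`."""
    chosen = set(nodes)
    return not any(i in chosen and j in chosen for i, j in edges)

# Path 0-1-2 with weights (1, 3, 1): max-product estimates at times 2 and 3.
path_edges = [(0, 1), (1, 2)]
candidate = one_step_agreement([0, 1, 0], [0, 1, 0])

# Triangle with unit weights: the estimates flip between all-1 and all-0,
# so no node passes the agreement test.
empty = one_step_agreement([1, 1, 1], [0, 0, 0])
```

Nodes that oscillate or report '?' are left out, which is what makes the output an independent set at any stopping time.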

Theorem 4.2 Under the setup of Theorem 4.1, let the algorithm stop after $t > T(\epsilon)$ steps, producing $x^t$. Let
$I^t$ be the independent set obtained from $x^t$ as per the stopping condition described above. Then, for $n > N_1(\epsilon)$
with large enough $N_1(\epsilon)$, the weight of $I^t$, $W(I^t)$, is such that



where $\delta(\epsilon) = O(\epsilon \log(1/\epsilon)) \to 0$ as $\epsilon \to 0$. Here, $I^*$ is the maximum weight independent set and $W(I^*)$ is
its weight.

4.1.1 Outline of proof 

We now present a brief outline of the proof, and present the details in the appendix due to space constraints.
Our results use the method of local weak convergence [7], and specifically the results of Gamarnik,
Nowicki and Swirszcz [8]. Under the random graph models considered here, i.e. random regular or
Erdős-Rényi graphs, for almost all nodes of the graph the local neighborhood looks like a tree (see Lemma A.1).
Now, under random selection of weights, the assignment of node values under MWIS is determined by the
local neighborhood for almost all nodes of the graph (see Lemma A.3); this property is also known as the
correlation decay property. Now, max-product produces the correct estimate for each node with respect to
its computation tree. The computation tree of a node is equal to the local neighborhood as long as the local
neighborhood is a tree (recall Lemma 3.1). Therefore, it is likely that for almost all nodes max-product
will produce the correct estimate after finitely many iterations. In what follows, we will make this precise by
overcoming important technical subtleties. The correctness of the stopping condition will follow from a certain
"anti-monotonicity" property of the max-product estimate procedure. The good approximation property
of the resulting estimate will follow from certain extremality properties of the exponential distribution (see
Lemma A.4).

⁵Here, the probability distribution is induced by the choice of random graph and weights.
A Proofs of Theorems 4.1 and 4.2



In this appendix, we present proofs of the two remaining Theorems 4.1 and 4.2. We start with some useful
known properties. Then, we study certain "local convergence" properties of max-product on the random graph
models considered in this paper. Building upon these, we conclude the proof of Theorem 4.1. Finally, using
Theorem 4.1 and an extremal property of exponential random variables, we conclude the proof of Theorem
4.2. We note that in this proof we assume that the algorithm starts at time t = 0 and not t = 1, for
notational reasons. This has no bearing on the results.

A.1 Useful properties

Here, we describe useful definitions, notation and properties for proving Theorem 4.1. To this end, consider
a fixed node i in graph G. Define $V_i(t) \subset V$ as

$$V_i(t) = \{ j \in V : \text{there is a path between } i \text{ and } j \text{ of length no more than } t \}.$$

Let $E_i(t) \subset E$ be the set of edges between these vertices and $G_i(t) = (V_i(t), E_i(t))$ be the subgraph of
G thus created. As defined earlier, let $T_i(t)$ be the computation tree of node i till iteration t. Then, it is
straightforward that $T_i(t) = G_i(t)$ (in terms of graph structure only) if $G_i(t)$ is itself a tree.
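
Whether $G_i(t)$ is a tree is easy to test on a given graph. The sketch below is illustrative only: the names `ball` and `ball_is_tree` are ours, and the graph is assumed to be simple and given as an adjacency dictionary. It collects $V_i(t)$ by breadth-first search and then uses the fact that a connected graph is a tree exactly when it has one fewer edge than nodes.

```python
from collections import deque

def ball(nbrs, i, t):
    """Nodes within distance t of i (the vertex set V_i(t))."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == t:
            continue  # do not expand beyond radius t
        for v in nbrs[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def ball_is_tree(nbrs, i, t):
    """True if the subgraph G_i(t) induced on V_i(t) is a tree."""
    nodes = ball(nbrs, i, t)
    # Each undirected edge inside the ball is counted from both endpoints.
    num_edges = sum(1 for u in nodes for v in nbrs[u] if v in nodes) // 2
    # G_i(t) is connected by construction, so it is a tree iff |E| = |V| - 1.
    return num_edges == len(nodes) - 1

# A cycle on 6 nodes: the radius-2 ball around node 0 is a path,
# while the radius-3 ball is the whole cycle.
cycle6 = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
near = ball(cycle6, 0, 2)
```

On sparse random graphs one would expect this check to succeed for most nodes when t is fixed and n is large, which is the content of the local structure result stated below in Lemma A.1.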

We need some notation and definitions before stating a result about structural properties of $G_i(t)$ when G =
G(n, c/n) or G = G(n, r). A Poisson tree of depth t and parameter c, denoted as T(c, t), is constructed as
follows: starting with a root, say 0, add a Poisson(c) number of children to it. Recursively, for each of the
children thus created, add a Poisson(c) number of children independently, until the tree has depth t. A regular tree
of depth t and parameter r, denoted as T(r, t), is constructed as follows: starting with a root, say 0, add r
children to it. Recursively, for each of the children thus created, add r - 1 children, until
the tree has depth t.
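
Both limiting trees are straightforward to generate. The following sketch uses a nested-list representation in which a node is the list of its children; this representation and the function names are our assumptions. Since Python's standard library has no Poisson sampler, the sketch includes Knuth's multiplication method, which is adequate for small c.

```python
import math
import random

def poisson(lam, rng):
    """Sample Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def poisson_tree(c, t, rng):
    """Nested-list Poisson tree T(c, t): each node is the list of its children."""
    if t == 0:
        return []
    return [poisson_tree(c, t - 1, rng) for _ in range(poisson(c, rng))]

def regular_tree(r, t, is_root=True):
    """Regular tree T(r, t): r children at the root, r - 1 elsewhere."""
    if t == 0:
        return []
    k = r if is_root else r - 1
    return [regular_tree(r, t - 1, False) for _ in range(k)]

def size(tree):
    """Number of nodes in a nested-list tree, counting the root."""
    return 1 + sum(size(child) for child in tree)

sampled = poisson_tree(2.0, 3, random.Random(1))
```

For example, T(3, 2) has a root with 3 children, each with 2 children, for 10 nodes in total.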

Now, we state the following well-known (and very important for us) property about the local structure
of G(n, c/n) and G(n, r) (see [12] and [13] respectively for details).

Lemma A.1 Consider graph G = G(n, c/n) or G = G(n, r) with finite values of c and r. Consider a fixed
node i (numbering of nodes is done prior to selection of edges). Then, as $n \to \infty$,

(a) For G = G(n, c/n), $G_i(t)$ converges (in distribution) to the Poisson tree T(c, t);

(b) For G = G(n, r), $G_i(t)$ converges (in distribution) to the regular tree T(r, t).

A.2 Local convergence of max-product

Consider the max-product algorithm running at a particular node, say i. In what follows, i is always used to
denote a fixed node⁶. We wish to understand the evolution of its estimate $x_i^t$, which depends on the messages

⁶This fixed node i is chosen prior to the selection of the random graph structure and weights or, equivalently, it is
selected uniformly at random from the n nodes for the purposes of the proofs.
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7 • Specifically, as we stated earlier, the messages 7 can be defined recursively on the computation tree 
Ti{t) as follows: node k generates message for its parent, say p{k), using messages from its children set, 
say c{k) and its weight W/- as 

7Lp(fc) = max lo,Wk- T^fe • 
V eec{k) J 

When $c(k) = \emptyset$, that is, $k$ is a leaf node of $T_i(t)$, then due to the initial condition,

$$\gamma^t_{k \to p(k)} = \max(0, W_k) = W_k,$$

since $W_k > 0$ with probability 1. Call the $\gamma^t$ obtained with this initial condition $\gamma^t(0)$.

Now, suppose the initial condition at a leaf node $k$, that is at depth $t$ of the computation tree, is $W_k$ instead of $0$. Equivalently, if the incoming messages to $k$ (at depth $t$) summed up to $W_k$, then

$$\gamma^t_{k \to p(k)} = \max(0, W_k - W_k) = 0.$$

Call the $\gamma^t$ obtained along all the edges of the computation tree $T_i(t)$ with such an initial condition $\gamma^t(W)$: that is, all leaf nodes at level $t$ of $T_i(t)$ have initial condition equal to their node weights, while leaf nodes$^7$ at level $< t$ have the usual initial condition $0$.

Finally, consider starting the algorithm with initial condition $L_k \in [0, W_k]$ at each leaf node $k$ at depth $t$ in $T_i(t)$. Then, the message from such a leaf node $k$ of $T_i(t)$ satisfies

$$0 \le \gamma^t_{k \to p(k)} = \max(0, W_k - L_k) \le W_k.$$

Let us denote the messages obtained by starting with initial condition $L$ (the vector of initial condition values for leaf nodes at depth $t$ of $T_i(t)$) as $\gamma^t(L)$.

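The above discussion implies the following: for $t = 1$, for any $j$ which is a child of $i$ and any starting condition $0 \le L \le W$ (component-wise inequality),

$$\gamma^1_{j \to i}(0) \ge \gamma^1_{j \to i}(L) \ge \gamma^1_{j \to i}(W).$$

This non-increasing behavior of the messages received at the root node as the initial condition increases holds for all odd $t$ by induction: each additional level of the tree reverses the direction of monotonicity, so an odd number of levels yields a non-increasing dependence.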

Lemma A.2 Consider an odd $t$, a fixed node $i$ and its computation tree $T_i(t)$. Then, for any non-negative starting condition (for leaf nodes at depth $t$) $0 \le L \le W$ (component-wise) and any child $j$ of the root node $i$,

$$\gamma^t_{j \to i}(0) \ge \gamma^t_{j \to i}(L) \ge \gamma^t_{j \to i}(W).$$
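The monotonicity in Lemma A.2 can be checked numerically on a sampled tree. The sketch below assumes the message recursion $\gamma_{k \to p(k)} = \max(0, W_k - \sum_{\ell \in c(k)} \gamma_{\ell \to k})$, a regular tree with i.i.d. Exp(1) weights, and the particular intermediate choice $L = W/2$; the tree representation `(weight, children)` and all function names are assumptions made for illustration.

```python
import random

def random_tree(r, t, rng, is_root=True):
    """Regular tree T(r, t) with i.i.d. Exp(1) weights; a node is (weight, children)."""
    w = rng.expovariate(1.0)
    if t == 0:
        return (w, [])
    k = r if is_root else r - 1
    return (w, [random_tree(r, t - 1, rng, False) for _ in range(k)])

def message(node, depth_left, init):
    """Message sent by `node` to its parent.

    depth_left is the distance from `node` down to depth t; `init` maps the
    weight W_k of a depth-t leaf to its initial condition L_k in [0, W_k].
    """
    w, children = node
    if depth_left == 0:
        return max(0.0, w - init(w))
    return max(0.0, w - sum(message(ch, depth_left - 1, init) for ch in children))

def root_messages(tree, t, init):
    """Messages from each child j of the root i after t iterations."""
    return [message(ch, t - 1, init) for ch in tree[1]]

def check_lemma_a2(tree, t, tol=1e-12):
    """True if gamma(0) >= gamma(L) >= gamma(W) for every child of the root."""
    g0 = root_messages(tree, t, lambda w: 0.0)      # initial condition 0
    gL = root_messages(tree, t, lambda w: 0.5 * w)  # one choice of L in [0, W]
    gW = root_messages(tree, t, lambda w: w)        # initial condition W
    return all(a >= b - tol and b >= c - tol for a, b, c in zip(g0, gL, gW))

rng = random.Random(1)
for t in (1, 3, 5):
    print(t, check_lemma_a2(random_tree(3, t, rng), t))
```

The three initial conditions are evaluated on the same weights, which is what makes the comparison component-wise rather than merely distributional.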

Next, we consider the estimate of node $i$ based on its messages at time $t$. For convenience, we define the notion of the bonus at node $i$ at time $t$, denoted $B_i^t$, as follows:

$$B_i^t = W_i - \sum_{j \in \mathcal{N}(i)} \gamma^t_{j \to i}.$$

$^7$Such leaf nodes can exist for a Poisson tree, but will not exist for a regular tree.

As per the max-product algorithm, the estimate is $x_i^t = 1$ if $B_i^t > 0$; $x_i^t = 0$ if $B_i^t < 0$; and $x_i^t = {?}$ otherwise. Let $B_i^t(0)$, $B_i^t(L)$ and $B_i^t(W)$ denote the bonus values when the algorithm is started with initial conditions $0 \le L \le W$ for leaf nodes of $T_i(t)$, respectively. From Lemma A.2 and the definition of the bonus, it follows that for any odd $t$,

$$B_i^t(0) \le B_i^t(L) \le B_i^t(W). \tag{6}$$
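Inequality (6) can be checked by hand on a one-level tree. The sketch below assumes the bonus is the node weight minus the sum of incoming messages, and that the estimate is read off from the sign of the bonus; the star-shaped example and its weights are hypothetical values chosen only for illustration.

```python
def bonus(w_i, incoming):
    """Bonus at node i: its weight minus the sum of the messages from its neighbors."""
    return w_i - sum(incoming)

def estimate(b):
    """Max-product estimate from the bonus: 1, 0, or '?' on a tie."""
    if b > 0:
        return 1
    if b < 0:
        return 0
    return "?"

# Hypothetical star with t = 1: root weight 1.0, two leaf children of weights 0.3 and 0.4.
w_root, w_leaves = 1.0, [0.3, 0.4]
b_zero = bonus(w_root, [max(0.0, w - 0.0) for w in w_leaves])      # initial condition 0
b_half = bonus(w_root, [max(0.0, w - 0.5 * w) for w in w_leaves])  # L = W / 2
b_full = bonus(w_root, [max(0.0, w - w) for w in w_leaves])        # initial condition W
print(b_zero, b_half, b_full, estimate(b_zero), estimate(b_full))
```

Here the bonus is positive under both extreme initial conditions, so the estimate at the root is 1 regardless of how the leaves are initialized, which is the mechanism the convergence proof exploits.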

By very similar reasoning, it also follows that the bonuses have an anti-monotone property when started with the initial condition $0$. That is, for any odd $t$,

$$B_i^t(0) \le B_i^{t+1}(0). \tag{7}$$

The following result is a direct adaptation of [8, Theorem 8]. It will be essential to complete our result.

Lemma A.3 Consider a fixed $\phi \in [0, \infty)$ and initial conditions $0 \le L \le W$ for leaf nodes at depth $t$ of $T_i(t)$. Let $\epsilon > 0$ be given. Then, there exists an odd $t(\epsilon)$ large enough such that the following holds: for $t = t(\epsilon)$,

(a) Let $G_i(t)$ (and hence $T_i(t)$) be (distributed as) the Poisson tree $T(c,t)$ with $c < 2e$. Then,

$$\Pr(B_i^t(0) \ge \phi) \le \Pr(B_i^t(L) \ge \phi) \le \Pr(B_i^t(W) \ge \phi) \le \Pr(B_i^t(0) \ge \phi) + \epsilon.$$

(b) Let $G_i(t)$ (and hence $T_i(t)$) be (distributed as) the regular tree $T(r,t)$ with $r < 4$. Then,

$$\Pr(B_i^t(0) \ge \phi) \le \Pr(B_i^t(L) \ge \phi) \le \Pr(B_i^t(W) \ge \phi) \le \Pr(B_i^t(0) \ge \phi) + \epsilon.$$

A.3 Proof of Theorem 4.1: putting things together

Now we complete the proof of Theorem 4.1 using Lemmas A.1, A.2 and A.3 along with some properties of the algorithm. We state the proof for $G(n, c/n)$ with $c < 2e$ and $G(n, r)$ with $r < 4$ simultaneously.

Consider a fixed node $i$ of $G$, where $G = G(n, c/n)$, $c < 2e$, or $G = G(n, r)$, $r < 4$. Let $\epsilon > 0$ be given. Let $E_1$ be the event that the local neighborhood of depth $t$ of node $i$, $G_i(t)$, is a tree. We will assume some odd $t = t(\epsilon)$ or $t(\epsilon) + 1$ as required per Lemma A.3. As part of the algorithm, the bonus value at node $i$, $B_i^t(0)$, can be computed recursively starting with the initial condition $0$ at the leaf nodes (at depth $t$) of its computation tree $T_i(t)$ as described before. Now, consider the bonus value $B_i^{t+1}(0)$ at the next step. Due to the update equation of our algorithm and the computation tree structure, $T_i(t) \subset T_i(t+1)$, it is easy to see that $B_i^{t+1}(0)$ is equal to $B_i^t(L)$ for some initial condition $L$, $0 \le L \le W$, for leaf nodes at depth $t$ of $T_i(t)$. This is because, irrespective of the incoming messages, the message generated along any edge always lies in $[0, W]$, where $W$ is the weight of the node generating it. Similarly, we can inductively argue that $B_i^s(0)$ is equal to $B_i^t(L_s)$ for some $0 \le L_s \le W$ for all $s \ge t$. Therefore, using Lemma A.2 and its consequence (6), for any $s \ge t$,

$$B_i^t(0) \le B_i^s(0) \le B_i^t(W). \tag{8}$$

Suppose that $E_1$ is true. Then $G_i(t) = T_i(t)$ and $W_i$ is independent of the messages $\gamma^t$ based on the computation tree $T_i(t)$. Therefore, $B_i^t(0) \ne 0$ and $B_i^t(W) \ne 0$ with probability 1, due to $W_i$ being drawn from a continuous distribution. Therefore, by the definition of $x_i^s$ based on $B_i^s(0)$ and (8), it follows that under the event $E_2 = \{B_i^t(0) > 0\}$, $x_i^s = 1$ for all $s \ge t$. Consider the event $E_3 = \{B_i^t(W) \le 0\}$. Given $E_1$, $E_3 = \{B_i^t(W) < 0\}$ since $B_i^t(W) \ne 0$ with probability 1. Therefore, using similar reasoning, under $E_3$ we have that $x_i^s = 0$ for all $s \ge t$. From this discussion, it follows that for $s \ge t$

$$\Pr(x_i^s \text{ converges}) \ge \Pr((E_2 \cup E_3) \cap E_1) = \Pr_{G_i(t)}((E_2 \cup E_3) \cap E_1), \tag{9}$$

where in the last equality we introduce the subscript $G_i(t)$ because the probability of the event $(E_2 \cup E_3) \cap E_1$ depends only on the local neighborhood of depth $t$, $G_i(t)$. Here, by $\Pr_{G_i(t)}$ we mean the distribution induced by the random graph $G(n, c/n)$ or $G(n, r)$ and the random node weights on the local neighborhood of node $i$ up to depth $t$. Inspired by Lemma A.1, consider the distribution induced by the Poisson tree $T(c,t)$ or the regular tree $T(r,t)$ (depending upon the random graph model of $G$) and the independent random node weights; denote this by $Q(\cdot)$.

Now, the sequence (dependent on the number of nodes $n$ in $G$) of distributions $\Pr_{G_i(t)}(\cdot)$ converges to $Q(\cdot)$ as per Lemma A.1. This convergence is defined over the appropriate topology of local weak convergence (see [31] for example). Now, it can be checked that $(E_2 \cup E_3) \cap E_1$ is an open set. Therefore, by the Portmanteau theorem it follows that

$$\liminf_{n \to \infty} \Pr_{G_i(t)}((E_2 \cup E_3) \cap E_1) \ge Q((E_2 \cup E_3) \cap E_1). \tag{10}$$

Equivalently, for a selection of $N(\epsilon)$ large enough we have that when $n \ge N(\epsilon)$,

$$\Pr_{G_i(t)}((E_2 \cup E_3) \cap E_1) \ge Q((E_2 \cup E_3) \cap E_1) - \epsilon. \tag{11}$$

Therefore, in order to establish that $x_i^s$ converges for $s \ge t$ with enough probability it is sufficient to show that

$$Q((E_2 \cup E_3) \cap E_1) \ge 1 - \epsilon.$$

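Now consider the following.

$$\begin{aligned}
Q((E_2 \cup E_3) \cap E_1) &\overset{(a)}{=} Q(E_2 \cup E_3) \\
&= Q(E_2) + Q(E_3 \cap E_2^c) \\
&\overset{(b)}{=} Q(E_2) + Q(E_3) \\
&\overset{(c)}{=} Q(B_i^t(0) > 0) + Q(B_i^t(W) < 0) \\
&= 1 - Q(B_i^t(0) \le 0) + Q(B_i^t(W) < 0) \\
&\ge 1 - \epsilon,
\end{aligned} \tag{12}$$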

where (a) follows from the fact that under $Q$ the event $E_1$ is always true; (b) follows from $E_3 = \{B_i^t(W) < 0\} \subset \{B_i^t(0) \le 0\} = E_2^c$; (c) follows from the definitions; and (12) follows from Lemma A.3 (for either $T(c,t)$ or $T(r,t)$). Thus, (9), (10) and (12) imply that $x_i^s$ converges with probability at least $1 - 2\epsilon$ for $n \ge N(\epsilon)$ and $s \ge t = t(\epsilon)$.
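The quantity $Q(E_2) + Q(E_3)$ bounded in (12) can be estimated by simulation on the limiting tree. The sketch below is a Monte Carlo illustration only, under these assumptions: the regular tree $T(r,t)$ with i.i.d. Exp(1) weights, the message recursion $\max(0, W_k - \sum \text{children messages})$, and the bonus defined as the root weight minus the incoming messages. The function names, sample sizes and seed are arbitrary choices, and the simulation is not a substitute for Lemma A.3.

```python
import random

def messages(r, depth_left, rng):
    """Sample a subtree below a non-root node and return the pair of messages
    (under initial condition 0, under initial condition W) sent to its parent."""
    w = rng.expovariate(1.0)          # node weight, Exp(1)
    if depth_left == 0:               # leaf at depth t
        return w, 0.0                 # max(0, w - 0) and max(0, w - w)
    s0 = sW = 0.0
    for _ in range(r - 1):            # non-root nodes have r - 1 children
        m0, mW = messages(r, depth_left - 1, rng)
        s0 += m0
        sW += mW
    return max(0.0, w - s0), max(0.0, w - sW)

def bonuses(r, t, rng):
    """Return the root bonuses (B(0), B(W)) on T(r, t), computed on the same weights."""
    w = rng.expovariate(1.0)
    s0 = sW = 0.0
    for _ in range(r):                # the root has r children
        m0, mW = messages(r, t - 1, rng)
        s0 += m0
        sW += mW
    return w - s0, w - sW

def decided_fraction(r, t, samples, seed=0):
    """Monte Carlo estimate of Q(E_2) + Q(E_3) = Q(B(0) > 0) + Q(B(W) < 0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        b0, bW = bonuses(r, t, rng)
        hits += (b0 > 0) or (bW < 0)  # the two events are disjoint since B(0) <= B(W)
    return hits / samples

for t in (1, 3, 5, 7):
    print(t, decided_fraction(3, t, 2000))
```

For $r = 3$ one would expect the printed fraction to move toward 1 as the odd depth $t$ grows, in line with the role Lemma A.3 plays in (12); the code only reports the empirical values and does not assume them.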

To prove the correctness of $x_i^t$ upon convergence, note the following: by Theorem 3.1 it follows that if $x_i^t$ has converged to $0$ or $1$ then the corresponding LP optimum solution must have that assignment for node $i$. By Lemma 2.1 it follows that these are the assignments from the MWIS. That is, if $x_i(\gamma^t)$ converges (or in fact is equal to $0$, or $1$, for two subsequent time steps) then Theorem 3.1 states that $x_i(\gamma^t)$ equals the value of the unique LP optimum for $i$, and Lemma 2.1 then implies that this estimate is actually correct for $G$. Therefore, we have proved that the converged value is the correct value. This completes the proof of Theorem 4.1 by re-selecting $\epsilon$ as $\epsilon/2$ and selecting appropriate values for $N(\epsilon)$ and $T(\epsilon)$.






A.4 Proof of Theorem 4.2



We need to establish that (a) the stopping condition induces an independent set, $I^t$, and (b) $I^t$ has high enough weight. To this end, recall that if $x_i^{t-1} = x_i^t = 1$ then by Theorem 3.1 it must be that the LP optimum assignment has $x_i^* = 1$, and by Lemma 2.1 this is indeed equal to the assignment as per the MWIS. Since $I^t$ contains only such nodes, it follows that $I^t \subset I^*$, where $I^*$ denotes the MWIS, and hence $I^t$ is a valid independent set.

Now, by Theorem 4.1 it follows that for a graph with $n \ge N(\epsilon)$ and number of iterations $t \ge T(\epsilon)$, at least a $1 - \epsilon$ fraction of nodes find their right assignment with probability at least $1 - \epsilon$. Therefore, it follows that $|I^* \setminus I^t| \le \epsilon n$ with probability at least $1 - \epsilon$. Now, consider the following extremal property of Exponential random variables.



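Lemma A.4 Consider $n$ i.i.d. random variables $X_1, \ldots, X_n$ with Exponential distribution of mean 1. Let the ordered sequence be $X_{\pi(1)} \ge \cdots \ge X_{\pi(n)}$. Then, for any $\epsilon \in (0, e^{-10})$,

$$\Pr\Big(\frac{1}{n} \sum_{i=1}^{\epsilon n} X_{\pi(i)} \ge 2\epsilon(1 + \ln(1/\epsilon))\Big) \le \exp(-n\epsilon \ln(1/\epsilon)).$$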



Proof: The proof follows by an application of Cramér's Theorem for Exponential random variables. Specifically, given $N$ i.i.d. random variables $Y_1, \ldots, Y_N$ with mean 1 and Exponential distribution, for any $L \ge 10$ it follows that

$$\Pr\Big(\frac{1}{N} \sum_{i=1}^{N} Y_i \ge L\Big) \le \exp(-N(L - 1 - \log L)) \le \exp(-NL/2). \tag{13}$$

Therefore, by application of (13) it follows that for any fixed collection of $\epsilon n$ of the variables $X_1, \ldots, X_n$, their summation is no larger than $n\epsilon L$ with probability at least $1 - \exp(-n\epsilon L/2)$ for $L \ge 10$.
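For completeness, the two inequalities in (13) can be recovered from the standard Chernoff argument for Exp(1) variables. The derivation below is a textbook calculation supplied here as a sketch, not text taken from the source; $\theta$ is the usual free parameter of the exponential moment bound.

```latex
% For Y ~ Exp(1): E[e^{\theta Y}] = 1/(1-\theta) for 0 \le \theta < 1.
\begin{align*}
\Pr\Big(\tfrac{1}{N}\sum_{i=1}^{N} Y_i \ge L\Big)
  &\le \inf_{0 \le \theta < 1} e^{-\theta N L}\,\big(\mathbb{E}[e^{\theta Y_1}]\big)^{N}
   = \inf_{0 \le \theta < 1} \exp\big(-N(\theta L + \ln(1-\theta))\big) \\
  &= \exp\big(-N(L - 1 - \ln L)\big)
   \qquad \text{(attained at } \theta = 1 - 1/L,\ L > 1\text{)}.
\end{align*}
% Second inequality: g(L) = L/2 - 1 - \ln L has g'(L) = 1/2 - 1/L > 0 for L > 2
% and g(10) = 4 - \ln 10 > 0, hence L - 1 - \ln L \ge L/2 for all L \ge 10.
```

The restriction $L \ge 10$ is therefore only used to simplify the exponent to $NL/2$; the first bound holds for every $L > 1$.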

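Now, given $n$ random variables there are $\binom{n}{\epsilon n}$ distinct ways to select $\epsilon n$ indices. Therefore, by an application of the union bound, the above discussion and Stirling's approximation it follows that

$$\Pr\Big(\frac{1}{n} \sum_{i=1}^{\epsilon n} X_{\pi(i)} \ge \epsilon L\Big) \le \binom{n}{\epsilon n} \exp\Big(-\frac{n\epsilon L}{2}\Big) \le \exp\big(n\epsilon \ln(1/\epsilon) + n\epsilon - n\epsilon L/2\big). \tag{14}$$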
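The Stirling-type step in (14) uses the bound $\ln \binom{n}{\epsilon n} \le n\epsilon \ln(1/\epsilon) + n\epsilon$, which is the familiar estimate $\binom{n}{k} \le (en/k)^k$ with $k = \epsilon n$. A small numerical check, with function names and test values chosen here purely for illustration:

```python
import math

def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def stirling_bound(n, eps):
    """Upper bound n*eps*ln(1/eps) + n*eps on ln C(n, eps*n), as used in (14)."""
    return n * eps * math.log(1 / eps) + n * eps

for n, eps in [(1000, 0.01), (10_000, 0.001), (100_000, 0.0001)]:
    k = round(eps * n)
    print(n, eps, log_binom(n, k), stirling_bound(n, eps))
```

In each row the first printed value should not exceed the second; the check covers only the combinatorial factor in (14) and says nothing about the probability term.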
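Select $L = L(\epsilon) = \max\{2(1 + \ln(1/\epsilon)), 10\}$. Then, we obtain that

$$\Pr\Big(\frac{1}{n} \sum_{i=1}^{\epsilon n} X_{\pi(i)} \ge \epsilon L(\epsilon)\Big) \le \exp(-n\epsilon \ln(1/\epsilon)). \tag{15}$$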

Note that for $\epsilon < e^{-10}$, $L(\epsilon) = 2(1 + \ln(1/\epsilon)) \ge 10$. This completes the proof of Lemma A.4. ■
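To get a feel for the scale in Lemma A.4, note that for an Exp(1) variable $X$ the mass above the $(1-\epsilon)$-quantile is $\mathbb{E}[X \mathbf{1}\{X > \ln(1/\epsilon)\}] = \epsilon(1 + \ln(1/\epsilon))$, so the level $2\epsilon(1 + \ln(1/\epsilon))$ in the lemma is about twice the typical value of the normalized top-$\epsilon n$ sum. The simulation below illustrates this; the values of `n`, `eps` and the seed are assumptions chosen for speed, and `eps = 0.01` lies outside the range $\epsilon < e^{-10}$ required by the lemma.

```python
import math
import random

def top_fraction_sum(n, eps, rng):
    """(1/n) times the sum of the eps*n largest of n i.i.d. Exp(1) samples."""
    xs = sorted((rng.expovariate(1.0) for _ in range(n)), reverse=True)
    k = int(eps * n)
    return sum(xs[:k]) / n

def threshold(eps):
    """The level 2*eps*(1 + ln(1/eps)) appearing in Lemma A.4."""
    return 2 * eps * (1 + math.log(1 / eps))

rng = random.Random(0)
n, eps = 20_000, 0.01   # illustrative values only; the lemma assumes a much smaller eps
value = top_fraction_sum(n, eps, rng)
print(value, threshold(eps) / 2, threshold(eps))
```

The three printed numbers are the empirical normalized sum, its heuristic mean $\epsilon(1 + \ln(1/\epsilon))$, and the level used in the lemma.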

From Lemma A.4, it follows that for $n$ large enough (say larger than $N_1(\epsilon)$) the net weight of the nodes in $I^* \setminus I^t$ is at most $\gamma(\epsilon) n$ with probability at least $1 - \epsilon$, for $\gamma(\epsilon) = O(\epsilon \ln(1/\epsilon))$ as $\epsilon \to 0$. Clearly, $\gamma(\epsilon) \to 0$ as $\epsilon \to 0$. As established in [8], the weight of the maximum weight independent set is $\Theta(n)$ with probability $1 - o(1)$ under the setup of our interest. It follows (using the union bound) that for $N_1(\epsilon)$ large enough, for $n \ge N_1(\epsilon)$,

with $\delta(\epsilon) = O(\epsilon \log(1/\epsilon))$. This completes the proof of Theorem 4.2.